An in silico approach to study the role of epitope order in the multi-epitope-based peptide (MEBP) vaccine design

With different countries facing multiple waves, with some SARS-CoV-2 variants more deadly and virulent, the COVID-19 pandemic is becoming more dangerous by the day and the world is facing an even more dreadful extended pandemic with exponential positive cases and increasing death rates. There is an urgent need for more efficient and faster methods of vaccine development against SARS-CoV-2. Compared to experimental protocols, the opportunities to innovate are very high in immunoinformatics/in silico approaches, especially with the recent adoption of structural bioinformatics in peptide vaccine design. In recent times, multi-epitope-based peptide vaccine candidates (MEBPVCs) have shown extraordinarily high humoral and cellular responses to immunization. Most of the publications claim that respective reported MEBPVC(s) assembled using a set of in silico predicted epitopes, to be the computationally validated potent vaccine candidate(s) ready for experimental validation. However, in this article, for a given set of predicted epitopes, it is shown that the published MEBPVC is one among the many possible variants and there is high likelihood of finding more potent MEBPVCs than the published candidates. To test the same, a methodology is developed where novel MEBP variants are derived by changing the epitope order of the published MEBPVC. Further, to overcome the limitations of current qualitative methods of assessment of MEBPVC, to enable quantitative comparison and ranking for the discovery of more potent MEBPVCs, novel predictors, Percent Epitope Accessibility (PEA), Receptor specific MEBP vaccine potency (RMVP), MEBP vaccine potency (MVP) are introduced. The MEBP variants indeed showed varied MVP scores indicating varied immunogenicity. Further, the MEBP variants with IDs, SPVC_446 and SPVC_537, had the highest MVP scores indicating these variants to be more potent MEBPVCs than the published MEBPVC and hence should be preferred candidates for immediate experimental testing and validation. The method enables quicker selection and high throughput experimental validation of vaccine candidates. This study also opens the opportunity to develop new software tools for designing more potent MEBPVCs in less time.

. The common components, i.e., adjuvants, linkers and predicted epitopes specific to various targets of SARS-CoV-2 are used in a MEBPV construction. The data provides the components which are commonly used in the design of MEBPVCs with relevant references.

Results
The purpose of a vaccine (MEBPVC in the current context) is to elicit a strong response from the host immune system against a disease (COVID-19) releasing various neutralizing antibodies which continue to stay in the body to protect the host from any repeat infection thenceforth. Toll-like receptors (TLRs) are the common receptors that interact with the immunogen (MEBPVC) and trigger the downstream response and release of neutralizing antibodies. From the informatics point of view, to correlate the properties, to derive relationship with immunogenicity and changed epitope positions of the MEBPVC, the necessary data was generated at four levels: (a) sequence level, (b) 3D structure level, (c) receptor-ligand interaction level and (d) dosage versus immune response level. The data generated thus, are analysed to understand if the changed epitope positions influenced the various properties and eventually the immunogenicity. Prior to the analysis, to establish the fact that the change in the order/positions of the epitopes in a MEBPVC changes the immunogenicity, one has to first assess the diversity among the MEBP variants. To assess the same, pairwise alignment, multiple sequence alignment (MSA), and structure alignment were performed on the ten variants. The results are discussed below. Table 2 provides the percent sequence identity (Lower triangle) and root mean square deviation (RMSD) (upper triangle) as a ready reference.
Further, at the 7th position, almost identical epitopes, VLSFELLHA, VVVLSFELL were present in SPVC_214 and SPVC_383 respectively which clearly justifies the 73.51% sequence identity (Table 2). However, their RMSD is one among the highest, i.e. 16.60Å (Table 2) indicating very dissimilar structures which is contrary to the notion in homology modeling that is, high sequence identity (> 30%) indicates high structural similarity and same function.
The pair with the least sequence identity (39.13%) is SPVC_214 and SPVC_387 (Fig. 1b). Their RMSD is 4.92 Å indicating reasonable structural similarity though sequences are not very similar. The pair with the highest RMSD i.e.17.3 Å are REF_SEQ and SPVC_214. With such high RMSD, it is commonly expected to have very Table 2. For ready comparison between the MEBP variants, the data is presented as percent sequence identity (lower triangle: below 100.00 diagonal) and root mean square deviation (RMSD) (upper triangle: above 100.00 diagonal). Lighter shades of a color indicate lower RMSD (higher structural similarity) and lower sequence identity whereas darker shades of the colors indicate higher RMSD (lower structural similarity) and higher sequence identity. www.nature.com/scientificreports/ little sequence similarity between the two sequences. However, they have a sequence identity of around 65%. Further analysis revealed that the same epitopes were seen at 2nd, 3rd, and 4th positions in the two MEBP constructs (REF_SEQ and SPVC_214) ( Immunological and biophysical properties of the MEBP dataset. Immunological properties. Antigenicity, allergenicity. The antigenicity scores from Vaxijen 2.0 server indicate that all the MEBPVCs are probable antigens with a range between 0.62 to 0.78 (Table 4). The sequence, SPVC_446 has the highest antigenicity(0.78) and SPVC_214 (0.62) has the lowest antigenicity. All the allergenicity scores predicted using AllerTop v2.0 and AllergenFP v1.0 servers indicate that all the variants are non-allergens hence the allergen column was omitted from (Table 4).
Biophysical properties. Stability. The stability scores of MEBPVC variants fall between 73.17 to 76.97 which is the first parameter used as a filter. As a rule, all the MEBPVC variants must be predicted as stable which is found to be true for all the variants in the dataset. SPVC_214 is predicted to have the lowest and SPVC_537, to have the highest stability scores. The variations in the stability scores, though not very dispersed, indicate that change in the order of epitopes influenced the stability of the vaccine construct.
Solubility. The next biophysical property considered is solubility. Less soluble proteins are a major concern since the proteins synthesized may not fold to the right structure and hence lose the activity and function and are observed to precipitate out or form inclusion bodies leading to various disease states 33 . The solubility scores range from 0 to 1.0 where > 0.5 score indicates soluble and < 0.5 indicates insoluble peptide. In our case, all the MEBP variants had solubility scores ranging from 0.73 to 0.83 indicating all are soluble. The SPVC_446 variant has the lowest solubility and the SPVC_32 variant has the highest solubility. These solubility scores also indicate that the order of epitopes in the MEBPVC is important and crucial in the design of a good vaccine candidate.   www.nature.com/scientificreports/ Accessibility. Solvent accessibility is an important feature which, in the current context, has direct implications in eliciting the immune response in the host. The higher the epitope accessibility the more immunogenic the vaccine candidate. For the MEBP variants, the percent epitope accessibility (PEA) ranged between 35 to 37. The variant SPVC_387 has the lowest accessibility and SPVC_383 has the highest accessibility.
Disorder. In our MEBP variants disorder ranges between 0.15 to 0.16 and hence all variants are considered ordered. The MEBPVC sequence SPVC_357 has high disorder and SPVC_387 has low disorder among the variants. Low disorder is considered favorable for better vaccine design.
Aggregation. The predicted aggregation propensities ranged from 2.7 to 3.3 with lower values considered favorable. The sequences REF_SEQ and SPVC_387 have the highest aggregation propensity and SPVC_32, SPVC_565, SPVC_537 have the lowest propensity.

Comparative docking analysis of MEBP variants. The properties compared in the previous sections
were sequence-based. The analysis proved that the order of the epitopes indeed influenced the stability, solubility, accessibility, disorder, and aggregation properties. To make the analysis more complete and comprehensive, the following sections explore the docking and MD simulation studies using the ab initio modeled 3D structures of the MEBP variants. In the previous section, the MEBP models (variants) and their RMS deviations were discussed. Among the family of Toll-Like Receptors (TLRs), the innate immune sensors, TLR8 and TLR4 are the most common receptors interacting with antigens/immunogens triggering an immune response from the host system to fight the immunogen. TLR8 plays an important role in the generation of effective immune responses in humans. TLR8 also senses the single-stranded RNA of viruses in the endosome and is predominantly expressed in the lungs. TLR4, plays an important role in the regulation of myocardial function, fibroblast activation, and acute inflammation by immune cells. Both the receptors are implicated in COVID-19. TLR4, is one of the 'fatedeciding' regulators of immunity and COVID-19 immunopathogenesis 34 . Table 5 shows the docking scores, binding affinities, minimization energies for both the receptors (TLR8 and TLR4) in complex with the MEBP variants.
ZRank Score is used to assess the quality of protein-protein docking. A more negative ZRank score indicates better docking. As can be seen from the tables, the ZRank scores are varying from (−100) to (−150) for the available MEBP variants. With TLR8, SPVC_537 has a better ZRank Score but an unfavorable binding affinity (11.92). With TLR4, SPVC_565 is showing favorable ZRank Score and acceptable binding affinity. It is interesting to note that there are at least four different MEBP variants having better ZRank scores when compared to REF_SEQ within a dataset size of ten variants. This definitely proves that changing the order of epitopes influences the 3D structure, which in turn influences the binding with immune machinery (TLRs) indicating the effectiveness of the immunogen (MEBPVC). Fig. 2 shows the docked MEBP variants with TLR4.

Simulation analysis of MEBPVC variants complexed with TLRs.
The molecular dynamic simulation production run for 100 ns yielded a center projected trajectory in which the MEBP vaccine complexes were centered along the system in order to calculate the relative RMSD and RMSF for the MEBP vaccine complexes. The calculated RMSD for all the MEBP-TLR complexes were interpreted as maximum deviation data points and the average deviation data points to arrive at a holistic conclusion. Normalizing the data points and considering average deviation data points will provide us with much more stable MEBP-TLR complexes. Considering only the maximum deviation data points and neglecting the rest of the stable data points of the MEBP vaccine complex simulation over a 100 ns span will not be feasible for a simulation of this larger time span. The RMSF calculations were also performed similarly (RMSD calculation). The obtained total RMSD/RMSF calculations of ten MEBP vaccine candidates are represented in Table 5. The maximum and average RMSD of the REF_SEQ vaccine candidate is 0.53 nm and 0.45 nm for the TLR4 complex and 0.62 nm and 0.48 nm for the TLR8 complex.
The maximum and average RMSF of the REF_SEQ is 0.54 nm and 0.19 nm for the TLR4 complex and 0.71 nm and 0.18 nm for the TLR8 complex respectively. Considering REF_SEQ complex as the reference, all the other MEBP vaccine complexes were screened accordingly. Combinatorial approach of considering both the vaccine potency score and stability will help us arrive at the most potent of the MEBP vaccine candidates. The combined MVP is calculated with our scoring algorithm based on various physicochemical parameters. The MVP score is observed to be relatively higher for SPVC_446 and SPVC_537 with a score of 8.858 and 8.899 respectively, when compared to the REF_SEQ with an MVP score of 8.595. Both SPVC_446 and SPVC_537 prove to be promising vaccine candidates with high potency scores. The maximum/average RMSD of SPVC_446 is 0.4 nm/0. 39  www.nature.com/scientificreports/ plots. The RMSD plots show that the SPVCs are more stable when in complex with both TLR4 and TLR8. It can be observed that REF_SEQ is relatively less stable when compared to SPVC_446 and SPVC_537. RMSF plots of SPVC_446 and SPVC_537 indicate better interactions and higher stability, when in complex with TLR4 (Fig.3e). However, all the three SPVCs show similar high stability when in complex with TLR8 without a clear distinction as seen in TLR4 complexes (Fig. 3f).
Taking the stability of the complexes of MEBP vaccine candidates into consideration to derive a conclusion to select the best of the vaccine candidates, the SPVC_446 proves to be the best among the vaccine candidates. In spite of SPVC_537 having higher MVP score, the complex falls short in the stability parameter which is an important property to be looked into for biological activity.
Dosage versus immune response simulation analysis. As the last of the in silico tasks, we performed dosage vs immune response simulation for the ten vaccine constructs (MEBP variants) using the C-IMMSIM server with the same objective, to see if the variants trigger different responses than REF_SEQ, if so will the response indicate more potency or less. Two simulation experiments were done: (a) with adjuvant and HIS-tag and (b) without adjuvant and HIS-tag. The MEBPVC variants had all the parameters within the optimal and recommended ranges for them to be considered as a potent vaccine candidate individually with the exception of SPVC_387 (Supp. 2 Fig. S19) (without adjuvants + HIS-tag). A common observation has been that a repeated exposure led to an overall increase in the immune response and a decrease in the antigenic load (Supp. 2 Fig. S1-20).
Few observations are presented here. When compared to REF_SEQ, all other variants trigger strong antibody (especially IgM or IgM + IgG) responses with their 1st dose (exposure). Of all the variants, SPVC_214 is seen to trigger the highest titers of IgG + IgM. Of all the constructs, REF_SEQ triggers the weakest. The titers reach ~650,000 counts per ml for SPVC_214 and others but only ~580,000 counts per ml for REF_SEQ.
It is interesting to note that variants without adjuvants + HIS-tag seem to trigger more strongly than with adjuvants + HIS-tag. The antibody titers reach 90,000 counts per ml without adjuvants + HIS-tag as compared to only 20,000 counts per ml with adjuvants + HIS-tag on exposure to 1st dose of SPVC_214. The IgM + IgG titers reach ~760,000 counts per ml on the last (third) exposure. A similar trend is seen for all other variants as well, where without adjuvants + HIS-tag are triggering a better immune response. Figs. 4a-c and 5a-c show the level of immunoglobulins (with and without adjuvant + HIS-tag) at two different doses (1st, 2nd, and 3rd).
It is also observed that some variants such as SPVC_383 trigger high TH cell populations per state with counts reaching around 8200 per mm 3 with an average antibody (IgG + IgM) response of around 570,000 counts. When www.nature.com/scientificreports/ compared to others, the same variant also shows the best B cell population counts (800 per mm 3 ). According to the MEBP dose versus IFN-γ response simulation, it is interesting to note that variants with adjuvant + HIS-tag and without adjuvant + HIS-tag have totally different trends as seen in Fig. 6a,b. For example, SPVC_446 (with adjuvant+HIS-tag) triggers the highest concentration of IFN-γ and REF_SEQ has the lowest concentration of IFN-γ in the first dose. These above observations clearly demonstrate that change in the epitope order in a MEBP vaccine candidate influences immunogenicity.
Ranking the ten variants and identifying the most potent MEBPVC. Table 6 summarizes the receptor specific scores (RMVPs) and final MVP score for each variant. From the MVP score, SPVC_446 is predicted to be the most potent MEBPVC followed by SPVC_537. As can be seen, the least potent is REF_SEQ clearly proving that better and more potent MEBPVCs are possible by changing the epitope order and that epitope order influences immunogenicity (details about the normalized data are provided in Supp. 3).

Discussion
Vaccine development typically takes 10 years. In the pre-COVID-19 world, the fastest vaccine development time recorded was four years against mumps. It is no small feat to develop a vaccine against COVID-19 in a span of 9-10 months and vaccinate nearly 1.5 billion people. This shall be the new benchmark and reference for future vaccine development strategies and preparedness for future pandemics. This has become possible because of global cooperation for vaccine research and distribution 35 .
The current COVID-19 vaccines listed under EUL have respective advantages and disadvantages 36 . The major disadvantage of Pfizer/BioNtech Comirnaty vaccine is its stringent cold chain requirement though it has shown very good titers 37 . The adenovector-based vaccines show relatively less effective neutralizing antibody response 38 . Inactivated vaccines seem to show inferior immunogenicity and low T Cell response, though have shown lower adverse reactions 39 . Similar to inactivated vaccines, the protein subunit vaccines show low immunogenicity. However, the possible advantages and potential benefits attracted the pharma companies to invest in protein subunit platforms. More than 30% of the total COVID-19 vaccine candidates undergoing trials are protein subunit vaccines with 65% under preclinical trials. Peptide-based vaccines have many unique benefits such as  www.nature.com/scientificreports/ in place. MEBPVC has further advantages in designing multi-epitope, multi-target, multi-copy, multi-disease, and custom-size (molecular weight) vaccine constructs. The MEBP subunit vaccine platforms are in the initial phases of development.
Applications of in silico approach to design a MEBP vaccine is one of the pragmatic opportunities that can reduce the time in developing vaccines and reach the market in shorter time duration. The in silico methodology presented in this article shall further reduce the time to identify potential new vaccine candidates under the protein subunit vaccine platform.
It is known that peptide vaccines are weakly immunogenic. Considering the advantages offered by peptidebased vaccines that include MEBP vaccines, it is worth addressing the peptide vaccine-specific issues, where the major issue seems to be lower immunogenicity. This limitation is being effectively addressed through (a) combining with adjuvants such as β-defensin 2, HSP70, HBD-2, Matrix-M1, nanoparticles, (b) altering the size (molecular weight), and others. Adjuvants have shown to significantly boost immunogenicity but have not matched the current platforms such as RNA, adenovirus vector, and inactivated virus-based platforms [40][41][42] .
It is the fundamental phenomenon that changes in the amino acid order change the structure and function, giving the clue that the earlier reported MEBPVC (REF_SEQ) could have variants if the epitope order changed. A set of ten variants were generated manually to explore if the variants thus generated have altered immunogenicity.
The variants were analyzed at the sequence, structure, interaction and dosage levels. In homology modeling, it is a common rule of thumb that for any two sequences, if the sequence identity is > 30%, it is assumed that their 3D structures shall be similar and likely to have identical function 43 . Further, it is also believed that with www.nature.com/scientificreports/ the increase in the sequence identity, the structural similarity also increases, i.e., RMSD decreases. However, it is interesting to note that the variants show deviations from the rule of thumb. As can be seen from Table 2 and  Table 3, there are many pairs that show deviations. There have been studies that proved that 3D structures of 100% identical sequences were having natural conformations that have RMSDs as large as 24Å 44 . There have been studies where an all-α helix protein (Protein G) was engineered and transformed into an all-β protein(Protein Rop) by changing only 50% of the amino acid composition [45][46][47] . There is a need to experimentally verify the MEBP variant 3D structures through experimental structure determination techniques such as X-Ray Crystallography, NMR and or Electron microscopy. The analysis, indeed, strongly suggests that changing the epitope order in MEBPVC changes the structure and hence the various associated properties resulting in the alteration of immunogenicity of the variant. Hence more potent MEBPVCs can be identified if the protocol described in this article is followed. The step to generate a dataset of shuffled variants is key in the analysis as this step enables the comparative study which otherwise has not been reported till now in MEBP vaccine design protocols. MVP score has also been developed which provides an opportunity to rank and identify the most potent MEBPVC from the dataset. Further, the data generated becomes the necessary input for developing better scoring schemes and algorithms.
The MVP scores were calculated through summing the individual RMVP scores of TLR4/8. The RMVP scores had two influencers: positive and negative. The positive influencers majorly indicate the physicochemical parameters, stability parameters and binding affinity of the protein complex. It also gives an overall insight about the functional ability of the MEBP-TLR complex. Likewise negative influencers were also taken into consideration: www.nature.com/scientificreports/ disorderness, aggregation, RMSD and RMSF. The negative influencers directly incorporate the instability aspect of the MEBP-TLR complex to the MVP score. The RMSD and RMSF in depth analysis through molecular dynamic simulation provided data about the interaction stability and residue fluctuation (OPLS all atom forces fields and SPCE water model were used). SPVC_446 and SPVC_537 showed the highest MVP score and hence, are the most potential MEBPVCs from among the variant dataset. The normalized positive influencer scores of these two MEBPVCs were higher as it had positive influence over the stability and binding interactions. The normalized negative influencer scores of these two MEBPVCs were lower as it inversely influenced the disorderness and binding affinity of the MEBPVCs. The RMSD and RMSF tend to be higher for highly disordered or misfolded proteins 48 . Hence the RMSD and RMSF were considered as the negative influencers while calculating the MVP score. Lower the RMSD and RMSF of the MEBPVCs, higher the stability. The lower RMSD and RMSF indicates higher stability and lower misfolding candidates i.e. SPVC_446 and SPVC_537.   49 . According to the MEBP dose versus IFN-γ response simulation, SPVC_446 triggers the highest concentration of IFN-γ and REF_SEQ has the lowest concentration of IFN-γ in the first dose. All the other higher responsive SPVCs were rejected based on the lower MVP score and stability. Similarly, the SPVCs that triggered higher antibody responses were also ignored, and more stable complexes were chosen to avoid possibility of protein aggregation and complications caused through misfolded proteins 50 .

NK) cells, which is activated through cell damage signals from infected or damaged cells through Cytotoxic T cells (CTLs). IFN-γ plays a major role in activation of macrophages, dendritic cells and T cells to curb the further downstream infection
It is pertinent to mention that the number of variants was restricted to ten out of a possible 3,628,800 (10! possibilities) unique variants of similar length. The peptide length was restricted to 183 amino acids. The scope of the work was to test if the immunogenicity changes with the epitope order in the MEBPVC. For some parameters, it was also observed that the changes in the biophysical and immune parameters were not significant. This can be attributed to the small sample size, restriction on the length of the construct, and manual shuffling of epitopes for such small changes. The next challenges are to work with a bigger dataset of variants, optimize the parameters influencing the MEBP vaccine design, gain deeper insights into the mechanism(s) behind the mutations and their virulence and improvise the MVP score incorporating the knowledge and the decision making systems such as AI and ML 51 .

Materials and methods
In general, a MEBP sequence is constructed with three broad components (peptide sequence patterns), namely, Linkers (for example AAY, GPGPG, EAAAK), Adjuvants for example β-defensin 2 and HSP70, and predicted target specific MHCI and MHC II Epitopes (oligopeptides with size ranging from 8AA to 20AA). The MEBP sequence (REF_SEQ) published earlier is taken as a reference for this study 25 . Table 4 lists the properties of REF_SEQ MEBPVC.

Construction of MEBP and its variants.
The construction of the REF_SEQ is described elsewhere 25 .
Using the REF_SEQ another nine MEBPVCs were generated by shuffling the epitope-linker set at the predesignated positions of REF_SEQ (Fig. 7).
The generated MEBP variants are given unique IDs following the format as: SPVC_NNN, where SPVC stands for shuffled peptide vaccine candidate and NNN stands for a unique number. The ten sequences are provided as supplementary data (Supp. 1 file).

Prediction of immunological and biophysical properties of MEBPVCs. The following relevant
immuno-(antigenicity and allergenicity) and biophysical-(protein stability index, surface/solvent accessibility, solubility, inherent intrinsic disorder, aggrescan, hydrophobicity) properties were predicted using the online/offline web servers/tools to compare and develop a rationale for identifying the more potent ones.
Antigenicity. Antigenicity is the extent to which the host immune system responds to the antigen (foreign body) triggering both humoral and cellular responses. VaxiJen2.0 server (http:// www. ddg-pharm fac. net/ vaxij en/ VaxiJ en/ VaxiJ en. html) was used to predict the antigenicity of the MEBP variants 55 . The result from VaxiJen 2.0 categorizes the peptide input into either Probable ANTIGEN or Probable NON-ANTIGEN. Only those variants which were categorized as Probable ANTIGEN were selected for further processing.
Allergenicity. Allergenicity is the extent to which an immunogen or antigen induces allergic reactions in the host system resulting in discomfort and or inconvenience or any allergic conditions such as asthma, diarrhea, skin rashes, and others. AllerTop v2.0 server (https:// www. ddg-pharm fac. net/ Aller TOP/) was used to predict the allergenicity of the MEBP variants 56 . The output from AllerTop v2.0 indicates if the input sequence is an allergen or not using the following restricted text values i.e. PROBABLE ALLERGEN or PROBABLE NON-ALLERGEN. Only those variants were selected for further processing which were categorized PROBABLE NON-ALLERGEN.
Peptide/protein stability index. Protein stability is an important property especially to understand the structure-function and activity relationships of a protein. The EXPASY ProtParam server (https:// web. expasy. org/ protp aram/) was used for predicting the Instability index of the nine MEBPVCs 57 . The instability index was modified to the stability index by subtracting the score from 100. A score of more than 60 henceforth indicates the input protein to have better stability.
Solvent accessibility. Solvent accessibility is an important feature for understanding and interpreting the structure-activity relationship 58 . The solvent accessibility or the surface exposed residues provide data that helps in various predictions such as protein-protein interactions, receptor-ligand interactions, drug designing, protein folding, and others. Scratch Protein Predictor (http:// scrat ch. prote omics. ics. uci. edu) was used to predict the solvent accessibility of the variants 59 . The output from Scratch Protein Predictor contained residue level accessibility with higher values indicating higher accessibility and vice versa. In the context of peptide vaccine design, higher accessibility and especially the residues with high accessibility in the epitope regions of MEBP is preferred since to elicit an immune response, the epitopes in the vaccine construct should be accessible, be exposed, and be on the surface. The accessibility predictor gives a string output, equal to the length of the sequence, with 'e' (for each exposed/accessible residue) and 'b'(for each buried residue). A more meaningful accessibility score in the context of MEBP vaccine design is percent epitope accessibility (PEA) defined and calculated as per the formula given below: The higher PEA value is considered favorable.
Solubility. Solubility of protein is an important biophysical property that depends on the amino acid composition and the 3D structure. Solubility influences the production of a protein and its half-life in the cell. Less soluble proteins are a major concern since the proteins synthesized may precipitate out or form inclusion bodies which lead to various disease states. SoluProt v1.0 server (https:// losch midt. chemi. muni. cz/ solup rot/) was used to predict the solubility of all the MEBP variants 60 . For each input sequence, i.e. an MEBP variant in the current context, the SoluProt v1.0 server gives a score in the range of 0-1.0 where > 0.5 score indicates soluble and < 0.5 indicates insoluble peptide. The vaccine constructs with higher solubility (> 0.5) were selected for further processing. www.nature.com/scientificreports/ each MEBP variant was calculated using the residue-wise disorder. The variants with the low average disorder are considered for further analysis.
Protein aggregation. Protein aggregation is a biological process in which protein/peptide subunits instead of forming regular and functional assemblages, misfold, aggregate (intra-or extracellularly), and precipitate. Protein aggregation is one of the important phenomena implicated in diseases such as Parkinson's, Alzheimer's, and prion diseases 62 . To predict the aggregation for the variants, we used the AGGRESCAN server (http:// bioinf. uab. es/ aggre scan/) 63,64 . Variants with lower predicted aggregation were considered for further analysis.
Hydrophobicity. Hydrophobicity is the 'water-hating/avoiding/repelling' property of molecules. Hydrophobic amino acids tend to fold and shrink together to minimize contact with the solvent water or hydrophilic surroundings. The hydrophobic effect is a well-known important property to understand the 3D structure of a protein 65 . Hydrophobic interactions are an important driving force in protein folding hence the overall 3D structure. The shape determines the function of the protein. In the context of MEBPs which are the potential immunogens, higher hydrophobicity indicates more globularity, rigid 3D structure, and associated accessible surface residues. Kyte and Doolittle's method was used to generate the Hydrophobicity values by using EMBOSS pepwindow server (https:// www. ebi. ac. uk/ Tools/ seqst ats/ emboss_ pepwi ndow/) 66 . The outputs were screened and filtered for higher hydrophobicity and used for further processing. Structural studies. Ab initio 3D structure prediction. A local installation of the I-TASSER-suite was used to predict the tertiary structures of the MEBP variants. The suite uses the Local Meta-Threading server (LOMET) to find out the suitable template structure from the input sequence. In the second step, the server performs the template-based fragment assembly simulation to create a full structure and the final step is that it will produce the top five predicted models with TM-score (Quantitative assessment of similarity between protein structures) and C-score (confidence score for estimating the quality of predicted models).
Molecular docking and binding affinity calculations. The protein-protein docking was performed in the Discovery studio (Z_DOCK module) 67 . The receptors (TLR4 and TLR8) were downloaded from RCSB with PDBIDs:4G8A, 3W3M, and the MEBP variants were used as ligands. The docking resulted in ~2000 clusters with each cluster containing ~70 to 80 poses of ligands (MEBP variants) with the receptor (TLR4 and TLR8). The top poses with the highest Z_RANK-SCORE were selected to find R_DOCK-SCORES (Refined docked protein). The R_DOCK uses CHARMm energy to optimize docked poses produced by the Z_DOCK module. The top hits from the R_DOCK module were selected for further simulation studies. The binding affinities were calculated using the MMGBSA module from Discovery studio 68 .
Molecular dynamics simulation. The MEBP variants were scored and assorted based on the favorability of each property it represented individually. Among hundreds of variants, we have selected the top 10 MEBP for finer investigation to produce a single potent MEBP vaccine candidate based on RMVP score (MEBP vaccine potency). The molecular dynamics simulation helps us study the stability and interaction of the MEBP-TLR complex in an ion based solvent system. All the top-scored MEBP vaccine candidates were subjected to molecular dynamics simulation using GROMACS v.2021 and topologies were generated using the OPLS force field. The system was solvated in a cubic box conformation using SPCE water model, energy minimized until the steepest descent energy, i.e. atoms are realigned to reduce the maximum net forces on them. The minimized atoms exert least forces on each atom and therefore serves as a favourable start point for molecular dynamics simulations. Pressurized and increasing temperature conditions were implemented for 100 ps. The molecular dynamics simulation was produced for a span of 100 ns over the centered projected trajectory. RMSD and RMSF were calculated using the gmx_rms option with reference to the MEBP Vaccine complex to TLR4 and TLR8 for all the ten MEBP TLR4/8 complexes.
Implementation of a scoring scheme for ranking and discovering the best MEBPVC from the library of variants.. The values obtained for the ten MEBPVC variants for the above discussed properties were grouped into (a) positive influencers (Stability, Accessibility, Solubility, Hydrophobicity, Antigenicity, ZrankScore, Binding Affinity (MMGBSA) b) negative influencers (Disorder, Aggregation, RMSD and RMSF). All the values of a property were normalized using normalization by averaging i.e.
where x is the original value of the property, n is the normalized value, i is the index of the variant, x is the average, all the variants.
x n,i = x i x

Conclusions
In pursuit to design and or discover more potent MEBPVCs, given the list of predicted epitopes, a methodology to generate variants and three predictors, percent epitope accessibility (PEA), receptor-specific vaccine potency (RMVP) and MEBP vaccine potency (MVP) scores have been introduced to quantify and enable an opportunity for the development of efficient MEBP vaccine design platform. This enabled not only ranking but also identifying the best MEBPVC. Thus, the results prove that the reported MEBPVC (REF_SEQ) is not the most potent candidate after all. In this article, only the order of epitopes has been used for generating variants. However, other parameters such as length of epitopes, length of the sequence, copy numbers, and or other parameters should also be explored in the future to design and discover more potent MEBPVCs. In conclusion, our in silico analysis and results indeed prove that changes in the position or order of the epitopes change the properties of the MEBPVC. Henceforth, the dataset generation method, PEA, RMVP, and MVP score should enable generating novel MEBPVCs and may be adopted in all MEBPVC design pursuits. The method enables the design and discovery of the computationally validated most immunogenic MEBPVC, each having a unique epitope order. Experimental validation and verification has no substitute; hence, the computationally validated vaccine constructs, with IDs, SPVC_446 and SPVC_537, need to be validated through in vitro, and in vivo experimental studies. The experimental validation provides important insights and inputs for improvising and developing a more efficient and more reliable scoring function and the improved versions of the software.